SVM

Before moving forward with the to-do list, let’s throw a Random Forest at it.

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with a polynomial OLS as the baseline model, simply because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. But let’s go back to our beloved RF.

/home/runner/work/strom/strom/.venv/lib/python3.10/site-packages/sklearn/svm/_base.py:1250: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

Writing pin:
Name: 'wd-svm'
Version: 20251228T112626Z-ae806
⏩ stepit 'svm_raw': Starting execution of `strom.modelling.assess_model()` 2025-12-28 11:26:26

/home/runner/work/strom/strom/.venv/lib/python3.10/site-packages/sklearn/svm/_base.py:1250: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2025-12-28 11:26:26

✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2025-12-28 11:26:26

♻️  stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-12-28 11:26:26

✅ stepit 'svm_raw': Successfully completed and cached [exec time 0.1 seconds, cache time 0.0 seconds, size 15.1 KB] `strom.modelling.assess_model()` 2025-12-28 11:26:26
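
The `ConvergenceWarning` in the log above is raised by scikit-learn's liblinear-based estimators. Here is a minimal sketch of the two usual remedies, scaling the features and raising `max_iter`. It assumes the model is a `LinearSVR` inside a pipeline; the actual `strom` pipeline is not shown on this page, and the data below is synthetic.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (assumption): one temperature-like feature and one
# badly scaled feature, which is a common cause of slow liblinear convergence.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-10, 30, 500), rng.uniform(0, 1e4, 500)])
y = 0.05 * (X[:, 0] - 15) ** 2 + 1e-3 * X[:, 1] + rng.normal(0, 1, 500)

# Standardise the features and give the solver more iterations than the default.
pipe = make_pipeline(
    StandardScaler(),
    LinearSVR(max_iter=10_000, random_state=0),
)

# Record warnings so we can count how many ConvergenceWarnings were raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
print("ConvergenceWarnings raised:", n_convergence_warnings)
```

If the warning persists after scaling, loosening `tol` or lowering `C` are further options, at the cost of a less exact solution.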

Metrics

| Metric | Single split: train | Single split: test | CV: test | CV: train |
|---|---|---|---|---|
| MAE - Mean Absolute Error | 2.253863 | 2.399957 | 2.684980 | 3.410023 |
| MSE - Mean Squared Error | 15.655275 | 20.449117 | 12.038683 | 27.480901 |
| RMSE - Root Mean Squared Error | 3.956675 | 4.522070 | 2.982166 | 5.136139 |
| R2 - Coefficient of Determination | 0.832035 | 0.784059 | -2.236684 | 0.722929 |
| MAPE - Mean Absolute Percentage Error | 0.226225 | 0.241069 | 0.395428 | 0.309699 |
| EVS - Explained Variance Score | 0.833088 | 0.784757 | 0.532616 | 0.818702 |
| MeAE - Median Absolute Error | 1.362887 | 1.422177 | 2.555536 | 2.625422 |
| D2 - D2 Absolute Error Score | 0.674623 | 0.677733 | -0.765698 | 0.520175 |
| Pinball - Mean Pinball Loss | 1.126932 | 1.199978 | 1.342490 | 1.705011 |

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check the residuals to assess the goodness of fit:

  • Are they white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?
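
The visual checks above can be complemented with two rough numbers. This is only a sketch on synthetic residuals, not the project's diagnostics: `residual_pattern_checks` is a hypothetical helper that uses a Spearman correlation between the predictions and the absolute residuals as a heteroscedasticity hint, and the quadratic coefficient of a polynomial fit as a non-linearity hint.

```python
import numpy as np
from scipy.stats import spearmanr


def residual_pattern_checks(y_true, y_pred):
    """Hypothetical helper: rough numeric hints for residual patterns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred

    # Heteroscedasticity hint: does the residual spread grow with the prediction?
    spread_corr, spread_p = spearmanr(y_pred, np.abs(resid))

    # Non-linearity hint: curvature left in the residuals (quadratic term).
    curvature = np.polyfit(y_pred, resid, deg=2)[0]

    return {
        "spread_corr": float(spread_corr),
        "spread_p": float(spread_p),
        "curvature": float(curvature),
    }


# Synthetic example (assumption): noise whose spread grows with the prediction.
rng = np.random.default_rng(0)
y_pred = np.linspace(1, 10, 400)
y_true = y_pred + rng.normal(0, 0.2 * y_pred)
checks = residual_pattern_checks(y_true, y_pred)
print(checks)
```

A clearly positive `spread_corr` with a small p-value suggests that the variance increases with the predicted value; the plots remain the primary evidence.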

Normality of Residuals:

Check for …

  • Are residuals normally distributed?
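
As a numeric companion to the normality plot, here is a sketch using `scipy.stats` on synthetic residuals. The helper name `normality_summary` and the data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def normality_summary(resid):
    """Hypothetical helper: Shapiro-Wilk test plus skewness and excess kurtosis."""
    resid = np.asarray(resid, dtype=float)
    w_stat, p_value = stats.shapiro(resid)
    return {
        "shapiro_W": float(w_stat),
        "p_value": float(p_value),
        "skew": float(stats.skew(resid)),
        "excess_kurtosis": float(stats.kurtosis(resid)),
    }


# Synthetic residuals (assumption): one roughly normal, one clearly skewed.
rng = np.random.default_rng(0)
resid_normal = rng.normal(0, 1, 300)
resid_skewed = rng.exponential(1.0, 300) - 1.0

summary_normal = normality_summary(resid_normal)
summary_skewed = normality_summary(resid_skewed)
print("normal:", summary_normal)
print("skewed:", summary_skewed)
```

With large samples the Shapiro-Wilk test rejects even small deviations, so the Q-Q plot and the skewness and kurtosis values are usually more informative than the p-value alone.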

Leverage

Scale-Location plot

Residuals Autocorrelation Plot
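
What an autocorrelation plot displays can be sketched with plain NumPy: the sample autocorrelation at each lag, compared against an approximate 95% white-noise band of ±1.96/√n. The `autocorrelation` function and the AR(1) example are illustrative assumptions, not the code behind the plot.

```python
import numpy as np


def autocorrelation(resid, max_lag=10):
    """Sample autocorrelation of the residuals for lags 0..max_lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    lags = [np.dot(r[:-k], r[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)


# Synthetic residuals (assumption): white noise and an AR(1) series built from it.
rng = np.random.default_rng(0)
n = 500
white = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

acf_white = autocorrelation(white)
acf_ar1 = autocorrelation(ar1)
band = 1.96 / np.sqrt(n)  # approximate 95% band under the white-noise hypothesis

print("band:", round(band, 3))
print("white noise, lag 1:", round(acf_white[1], 3))
print("AR(1), lag 1:", round(acf_ar1[1], 3))
```

Lags that fall well outside the band indicate that the model has left temporal structure in the residuals.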

Residuals vs Time

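Well, not that bad on the single split (test R2 of 0.78 against 0.83 on train), but under cross-validation it is overfitting quite a lot: R2 falls from 0.72 on the training folds to -2.24 on the test folds.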
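
A train/test gap like this can be measured directly with `cross_validate(..., return_train_score=True)`. The sketch below is built on assumptions: the data is synthetic, the model is a scaled `LinearSVR`, and the splitter is `TimeSeriesSplit`. The splitter actually used by `strom.modelling.cross_validate_pipe()` is not shown on this page.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic temperature/consumption data (assumption), in time order.
rng = np.random.default_rng(0)
n = 400
temperature = rng.uniform(-5, 30, n)
X = temperature.reshape(-1, 1)
y = 20 - 0.6 * temperature + rng.normal(0, 1.5, n)

pipe = make_pipeline(
    StandardScaler(),
    LinearSVR(max_iter=10_000, random_state=0),
)

# Score both the training and the held-out part of every fold.
cv = cross_validate(
    pipe,
    X,
    y,
    cv=TimeSeriesSplit(n_splits=5),
    scoring=("r2", "neg_mean_absolute_error"),
    return_train_score=True,
)

# A large positive gap means the model does much better on data it has seen.
gap_r2 = cv["train_r2"].mean() - cv["test_r2"].mean()
print("mean train R2:", round(cv["train_r2"].mean(), 3))
print("mean test R2: ", round(cv["test_r2"].mean(), 3))
print("R2 gap:       ", round(gap_r2, 3))
```

Keep in mind that R2 is computed per fold, so a test fold with little variance in the target can produce a strongly negative R2 even when the absolute errors are small.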

♻️  stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-12-28 11:26:30
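
The cached `grid_search_pipe` step is not shown on this page. As a rough idea of what tuning an SVM regressor can look like, here is a sketch with `GridSearchCV`. The grid values, the `LinearSVR` estimator, the scoring and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic temperature/consumption data (assumption).
rng = np.random.default_rng(0)
n = 300
temperature = rng.uniform(-5, 30, n)
X = temperature.reshape(-1, 1)
y = 20 - 0.6 * temperature + rng.normal(0, 1.5, n)

pipe = make_pipeline(
    StandardScaler(),
    LinearSVR(max_iter=10_000, random_state=0),
)

# make_pipeline names each step after its lowercased class name,
# so the LinearSVR parameters are addressed as "linearsvr__<param>".
param_grid = {
    "linearsvr__C": [0.1, 1.0, 10.0],
    "linearsvr__epsilon": [0.0, 0.5, 1.0],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV MAE:", round(-search.best_score_, 3))
```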


Writing pin:
Name: 'wd-svm'
Version: 20251228T112630Z-37e61
⏩ stepit 'svm_tuned': Starting execution of `strom.modelling.assess_model()` 2025-12-28 11:26:30

/home/runner/work/strom/strom/.venv/lib/python3.10/site-packages/sklearn/svm/_base.py:1250: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

⏩ stepit 'get_single_split_metrics': Starting execution of `strom.modelling.get_single_split_metrics()` 2025-12-28 11:26:30

✅ stepit 'get_single_split_metrics': Successfully completed and cached [exec time 0.0 seconds, cache time 0.0 seconds, size 1.0 KB] `strom.modelling.get_single_split_metrics()` 2025-12-28 11:26:30

♻️  stepit 'cross_validate_pipe': is up-to-date. Using cached result for `strom.modelling.cross_validate_pipe()` 2025-12-28 11:26:30

✅ stepit 'svm_tuned': Successfully completed and cached [exec time 0.1 seconds, cache time 0.0 seconds, size 15.0 KB] `strom.modelling.assess_model()` 2025-12-28 11:26:30

Metrics

| Metric | Single split: train | Single split: test | CV: test | CV: train |
|---|---|---|---|---|
| MAE - Mean Absolute Error | 2.191867 | 2.286753 | 1.293039 | 2.436891 |
| MSE - Mean Squared Error | 15.723957 | 21.422647 | 2.980500 | 18.131235 |
| RMSE - Root Mean Squared Error | 3.965344 | 4.628461 | 1.645820 | 4.255542 |
| R2 - Coefficient of Determination | 0.831298 | 0.773778 | 0.090348 | 0.816841 |
| MAPE - Mean Absolute Percentage Error | 0.191461 | 0.190971 | 0.214997 | 0.194748 |
| EVS - Explained Variance Score | 0.832178 | 0.784748 | 0.505620 | 0.817841 |
| MeAE - Median Absolute Error | 1.216609 | 1.205211 | 1.094747 | 1.465526 |
| D2 - D2 Absolute Error Score | 0.683573 | 0.692934 | 0.149081 | 0.656816 |
| Pinball - Mean Pinball Loss | 1.095934 | 1.143376 | 0.646519 | 1.218445 |

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check the residuals to assess the goodness of fit:

  • Are they white noise, or is there a pattern?
  • Is there heteroscedasticity?
  • Is there non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs